Abstract: Species delimitation is one of the most fundamental issues in biology and has recently drawn significant
interest. A main reason for this increasing interest is the rapid development of molecular techniques associated
with the DNA barcoding initiative. One of the most important principles in diagnosing species or species groups is
to choose appropriate markers. However, incomplete lineage sorting and introgression, which are widespread phenomena
in plants, present major obstacles to species delimitation. Recently, significant progress in our understanding of
gene flow dependent introgression and species delimitation has been made both theoretically and empirically. In this
paper, we review gene flow mediated speciation, evaluate the differences between introgression and incomplete lineage
sorting, and conclude that species delimitation should be more effective with markers experiencing high levels of
gene flow.
Gene flow mediated speciation

Speciation refers to the evolution of reproductive barriers (as well as
phenotypic, behavioral and genetic differences) between populations, eventually
leading to distinct species (Coyne and Orr, 1997; Rieseberg et al., 2006 and
references therein). However, the mechanisms by which species are formed remain
incompletely understood and are the topic of intense research and debate. One
major reason for the continued debate relates to differing opinions on species
concepts (Wiens, 2004).


Species concepts have played a major role in evolutionary biology during the
past 250 years, as summarized by De Queiroz (2007) in his review paper “Ernst
Mayr and the modern concept of species”. The term “species” originally came
from the Latin word for “kind” and its use has been made more precise following
the work of Carolus Linnaeus (1753 and 1758). In this view, the species is “the
basic unit of biological classification” (Flexner and Hauck, 1993). A major
shift took place after the seminal publication of Charles Darwin’s On the
Origin of Species (1859). The modern evolutionary synthesis provided the
foundation for systematics and evolutionary biology (Dobzhansky, 1937; Mayr,
1942). Ernst Mayr, who coined the biological species concept (BSC), proposed
that species are ‘groups of actually or potentially interbreeding natural
populations, which are reproductively isolated from other such groups’.
This definition presents two
characteristics of species: (i) that they have a continuous gene pool (i.e. all
individuals can, or have the potential to, interbreed) and (ii) that the
individuals of a species are reproductively isolated from individuals of other
species (Mayr, 1963; Niklas, 1997). This definition suggests that interspecific
gene flow should be low. Although the BSC is widely accepted by zoologists,
many botanists prefer using morphological characters as the main features to
diagnose species and have criticized the use of reproductive barriers for
species delineation because such barriers are not as strict in plants as in
animals (e.g. many plant taxa are potentially interfertile, and parthenogenesis
is common in plants). In addition, Mayr later pointed out that ‘the steady and
high genetic input caused by gene flow is the main factor responsible for
genetic cohesion among the populations of a species’ (Mayr, 1963). This latter
argument emphasizes the importance of intraspecific gene flow, which unites all
individuals and populations of one species. While most researchers agreed that
interspecific gene flow ought to be limited to keep species distinct, arguments
were made against the importance of intraspecific gene flow, as species
cohesion did not always seem to depend on intraspecific gene flow (Ehrlich and
Raven, 1969).
Recently, Morjan and Rieseberg (2004) demonstrated that intraspecific gene
flow, even when limited, is essential to keep species genetically coherent,
while acknowledging that other evolutionary forces such as selection are also
important in maintaining species cohesion. Both components, inter- and
intraspecific gene flow, therefore lie at the root of the biological species
concept (Mayden, 1997). However, their interaction and the way they affect
species delimitation have drawn attention only very recently (Petit and
Excoffier, 2009; Du et al., 2011).

Species delimitation has involved many methods to identify the actual
boundaries of species and, at the same time, determine the number of species
(De Queiroz, 2007). Delimiting species has traditionally relied on
morphological characters supplemented with geographic and ecological
information (Briggs and Walters, 1997). Interest in species delimitation has
fluctuated through time: it drew a lot of attention in the middle of the last
century, thanks to the emergence of modern systematics (Sites and Marshall,
2004), and is now experiencing a renaissance thanks to the rapid development of
molecular technologies (Wiens, 2004). In particular, DNA barcoding (the use of
a short standardized DNA sequence to identify and discover species) has
attracted much attention (Hebert et al., 2003, 2004; Hollingsworth et al.,
2011; Li et al., 2011). A key point for DNA barcoding, and more generally for
species delimitation, is to choose a “good” marker showing species-specific
variation.


Incomplete lineage sorting and introgression 

However, numerous studies have revealed shared DNA polymorphisms between
closely related species. This situation can arise for two main reasons:
(1) retention of ancestral polymorphisms, caused by incomplete lineage sorting
(also called sharing of ancestral polymorphisms) during and following
speciation (Heckman et al., 2007; Willyard et al., 2009 and references
therein); (2) introgression, caused by genetic exchange after secondary contact
between two previously geographically separated species (Liston et al., 1999;
Gay et al., 2007). Distinguishing these two mechanisms is difficult; the most
common approach is to use coalescent modeling to compare divergence time and
ancestral population sizes of the two species. Several
studies have used this approach (Hey, 2001; Rannala and Yang, 2003; Smith and
Farrell, 2005; Burgess and Yang, 2008; Joly et al., 2009). However, for closely
related species that have diverged very recently, this type of approach seems
to have limited utility, unless spatial information is taken into account
(McGuire et al., 2007). Incomplete lineage sorting and introgression can be
differentiated by studying the geographic variation and demographic history of
the species using molecular markers. If shared polymorphisms are randomly
distributed, then retention of ancestral polymorphisms might be involved.
However, if shared haplotypes occur only in sympatric populations, then
introgression is more likely (Fig. 1).
Fig. 1 Possible interpretations of the mechanisms underlying cases of
shared haplotypes between species. If shared haplotypes are randomly
distributed, incomplete lineage sorting is more likely (top left).
If introgression is involved, following contact between species,
shared haplotypes will be restricted to the areas of sympatry
(bottom left). In the illustration, only the distribution of
haplotypes of species 2 is shown.
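
The geographic criterion described above can be made concrete with a small
sketch. The Python snippet below is only an illustration under stated
assumptions: each population of the focal species is reduced to two
hypothetical flags (whether it is sympatric with the other species, and whether
it carries a haplotype shared with that species), and the function names are
ours rather than part of any published method. A real analysis would rely on a
formal statistical test and on coalescent modeling rather than on this simple
rule.

```python
from typing import Iterable, List, Tuple

# Hypothetical record: (is_sympatric, has_shared_haplotype) for one population.
Population = Tuple[bool, bool]


def shared_rates(populations: Iterable[Population]) -> Tuple[float, float]:
    """Return the fraction of sympatric and of allopatric populations
    that carry a haplotype shared with the other species."""
    counts = {True: [0, 0], False: [0, 0]}  # is_sympatric -> [with shared, total]
    for is_sympatric, has_shared in populations:
        counts[bool(is_sympatric)][0] += int(bool(has_shared))
        counts[bool(is_sympatric)][1] += 1

    def rate(pair: List[int]) -> float:
        shared, total = pair
        return shared / total if total else 0.0

    return rate(counts[True]), rate(counts[False])


def likely_mechanism(populations: Iterable[Population]) -> str:
    """Apply the qualitative rule from the text (illustrative only)."""
    sympatric_rate, allopatric_rate = shared_rates(list(populations))
    if sympatric_rate == 0.0 and allopatric_rate == 0.0:
        return "no shared haplotypes"
    if allopatric_rate == 0.0:
        # Sharing restricted to areas of contact between the two species.
        return "introgression"
    # Sharing also occurs away from the contact zone.
    return "incomplete lineage sorting"


if __name__ == "__main__":
    contact_only = [(True, True), (True, True), (False, False), (False, False)]
    scattered = [(True, True), (False, True), (False, False), (True, False)]
    print(likely_mechanism(contact_only))
    print(likely_mechanism(scattered))
```

The rule deliberately returns a label rather than a probability: it encodes
only the contrast between sharing restricted to sympatry and sharing scattered
across the range.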


Introgression is a widespread phenomenon with potentially profound
evolutionary consequences (Anderson and Hubricht, 1938; Anderson, 1953;
Anderson and Stebbins, 1954; Rieseberg and Brunsfeld, 1992). Edgar Anderson
defined introgression as the infiltration of genes from one species into
another through regular mating events involving backcrosses with one of the
parental species. The introgression process can be divided into three steps:
generation of F1 hybrids; backcrossing with one or both of the parents; and
incorporation of this new genetic variation into the genome of the backcrossing
species, possibly following screening by natural selection (Anderson and
Hubricht, 1938; Anderson, 1953; Anderson and Stebbins, 1954). Indeed,
individuals with introgressed genetic material can selectively retain (or
“filter”) advantageous genes, while disadvantageous genes can be eliminated by
purifying selection (Key, 1968; Harrison, 1986).
The possibilities offered by introgression were realized early on by breeders
wishing to incorporate a given attribute of a wild relative into a domesticated
species (Bessey, 1906; Gur and Zamir, 2004). Introgression processes may differ
among genomes. In the nuclear genome, the F1 hybrid gets 50% of its genes from
each parent, and the proportion of introgressed genetic material is halved with
every generation of backcrossing. For uniparentally inherited genomes, the
situation is strikingly different. Each F1 hybrid receives a complete,
unaltered version of the genome from one of its parents, so that there is no
dilution of the contribution of the donor species or population after several
generations of backcrossing (Fig. 2). Introgression can occur very quickly
(within several generations); in contrast, incomplete lineage sorting typically
reflects much more ancient events.
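
The halving rule for the nuclear genome can be written as a short calculation.
The sketch below assumes an idealized scheme in which the F1 hybrid and each
later generation are backcrossed to the recipient species only, with no
selection or linkage; the function name `donor_fraction` is hypothetical. Under
these assumptions the expected donor share after n backcross generations is
(1/2)^(n+1), which reproduces the value of 1/32 given for four generations in
Fig. 2.

```python
from fractions import Fraction


def donor_fraction(backcross_generations: int) -> Fraction:
    """Expected share of the nuclear genome derived from the donor species."""
    if backcross_generations < 0:
        raise ValueError("number of backcross generations must be non-negative")
    # The F1 hybrid (0 backcrosses) carries 1/2; each backcross halves it again.
    return Fraction(1, 2) ** (backcross_generations + 1)


if __name__ == "__main__":
    for n in range(5):
        print(f"after {n} backcross generation(s): {donor_fraction(n)}")
```

For a uniparentally inherited organelle genome no such dilution applies: the
genome is passed on whole along the transmitting parental line, which is the
contrast drawn in the text.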

Note that genetic drift is greatly reduced in a subdivided population compared
to a single random mating population of similar census size (Wright, 1943;
Gilpin, 1991). Hence, for a similar population size, reduced intraspecific gene
flow, which implies increased genetic drift within local populations and hence
increased subdivision, means that sorting of ancestral variation will take
longer, making species diagnosis more challenging. Thus, species delimitation
should be easier if it is based on molecular markers experiencing high rates of
gene flow (Fig. 3).
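
The link between gene flow and the time needed for lineage sorting can be
illustrated with a standard approximation that is not taken from this paper. In
Wright's island model with many demes, the effective size of the whole species
is roughly Ne = nN(1 + 1/(4Nm)), where n is the number of demes, N the diploid
size of each deme and m the migration rate. Because expected sorting times
scale with Ne, a marker with lower m behaves as if the species were larger. The
function name and parameter values below are hypothetical, and the numerical
constant would differ for haploid, uniparentally inherited organelle genomes;
only the qualitative dependence on m is the point here.

```python
def island_model_effective_size(n_demes: int, deme_size: int, migration_rate: float) -> float:
    """Approximate species-wide effective size under Wright's island model.

    Assumes many equally sized demes and a diploid, biparentally inherited
    locus: Ne ~= n * N * (1 + 1 / (4 * N * m)).
    """
    if n_demes <= 0 or deme_size <= 0 or migration_rate <= 0:
        raise ValueError("all parameters must be positive")
    census_size = n_demes * deme_size
    return census_size * (1.0 + 1.0 / (4.0 * deme_size * migration_rate))


if __name__ == "__main__":
    # Same census size (10 demes x 100 individuals), contrasting gene flow.
    for m in (0.1, 0.01, 0.001):
        ne = island_model_effective_size(10, 100, m)
        print(f"m = {m}: Ne ~ {ne:.0f}")
```

With the census size held constant, the approximation returns a larger
effective size as the migration rate falls, in line with the argument that
markers experiencing little gene flow sort ancestral variation more slowly.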
Fig. 2 Illustration of the introgression process for three different genomes: biparentally inherited nuclear genome (left);
maternally inherited organelle genome (middle) and paternally inherited organelle genome (right). The red circles represent
genes with maternal ancestry and the blue circles represent genes with paternal ancestry in the offspring. For the nuclear
genome (left), the genetic material inherited from the donor parent is reduced to 1/32 after four generations of backcrossing.
For the maternally inherited genome, the genetic material from the father does not contribute at all (middle). In contrast,
for a paternally inherited genome, the offspring retains the entire genome from the father, so there is no dilution effect (right).

Fig. 3 Species delimitation ability of two markers experiencing contrasting rates of gene flow under the condition of similar population size.
The marker with lower levels of gene flow (marker 1) has lower taxonomic resolution than the marker with higher gene flow (marker 2).


Recent progress on gene flow dependent
introgression and species delimitation

To date, many empirical studies have demonstrated that incomplete lineage
sorting is widespread in plants (Du et al., 2009; Zhou et al., 2010; Wang et
al., 2011a; Palma-Silva et al., 2011); however, studies on introgression in
plants are lacking (but see Arnold et al., 2010). Recently,
significant progress in our understanding of introgression has been made with
the development of a neutral demo-genetic model (Currat et al., 2008). This
model predicts that, when one species invades an area already occupied by a
related species, introgression of neutral genes takes place mainly from the
native species towards the invading species. In addition, following contact
between two hybridizing species, the model predicts that introgression should
be particularly frequent for genome components experiencing little gene flow.
In line with this neutral model, Petit and Excoffier (2009) suggested that
markers experiencing high rates of gene flow should be better suited for
species delimitation than those experiencing low rates of gene flow, in part
because high rates of intraspecific gene flow can prevent introgression.

Empirical studies also support the above predictions in both gymnosperm and
angiosperm plants. Du et al. (2011) used molecular markers from two organelle
genomes (mtDNA and cpDNA) with contrasting rates of gene flow to examine
genetic exchanges between two morphologically distinct spruce (Picea) species
growing on the Qinghai-Tibetan Plateau. They found that all sympatric
populations of the expanding species had received their maternally inherited
mitochondrial DNA (mtDNA; transferred by seed, low gene flow) from the resident
species, whereas for paternally inherited chloroplast DNA (cpDNA; transferred
by pollen and seeds, high gene flow) introgression was more limited and not
strictly unidirectional (see their schematic model in Fig. 1 of Du et al.,
2011). In angiosperm plants, however,
after comparative analysis of a large dataset on both chloroplast DNA (rbcL,
matK and trnH-psbA) and the nuclear internal transcribed spacer (ITS), the
China Plant BOL Group et al. (2011) discovered that the latter performed
relatively well in angiosperm species delimitation. This conclusion, based on a
large dataset, represents another step towards the routine use of DNA barcoding
(Hollingsworth et al., 2011) as well as the use of markers with high rates of
gene flow to diagnose species (Wang et al., 2011b).


Conclusion 

Recent progress in both theoretical and empirical studies suggests that the
important role of gene flow should not be ignored, whether a study focuses on
the demographic history of species or on species diagnostics. If a study aims
to reveal the phylogeographic history of a species, then markers with low rates
of gene flow should be used, i.e. mtDNA in gymnosperms or organelle (cp) DNA in
angiosperms. However, if a study is designed to delimit related species or
species groups (barcoding, for example), then markers with high rates of gene
flow should be chosen, i.e. cpDNA in gymnosperms or nuclear DNA in angiosperms.